Bayesian adaptive nearest neighbor
Abstract
The k-nearest neighbor (k-NN) classifier is a simple and popular classification method. However, it has a major drawback: it assumes that the local class posterior probability is constant within the neighborhood. It is also highly sensitive to the choice of the number of neighbors k, and it lacks a proper probabilistic formulation. In this article, we propose a Bayesian adaptive nearest neighbor method (BANN) that adaptively selects both the shape of the neighborhood and the number of neighbors k. The shape of the neighborhood is selected automatically, with the help of discriminants, according to the concentration of the data around each query point. The neighborhood size is not fixed in advance; instead, it is left free and given a prior distribution, which lets the model select an appropriate neighborhood size. Because the model is fitted using Markov chain Monte Carlo (MCMC), predictions are based not on a single neighborhood size but on a mixture over k. The BANN model is highly flexible: it captures local patterns in the data-generating process and adapts to them to improve prediction. We applied the model to four simulated data sets with special structures and to five real-life benchmark data sets. BANN shows substantial improvement over k-NN and discriminant adaptive nearest neighbor (DANN) in all nine case studies, and it also outperforms the probabilistic nearest neighbor (PNN) method in most of the data analyses.
© 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 92–105, 2010
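The abstract contrasts two ways of choosing k. Plain k-NN commits to one fixed k. BANN instead predicts from a mixture over k. The minimal Python sketch below illustrates that contrast only, under these assumptions:

- Distance is Euclidean.
- The weights over k are hand-chosen, not learned.
- The function names `knn_class_probs` and `mixture_over_k` are hypothetical.

It is not the BANN method. It does not adapt the neighborhood shape with discriminants, and it does not run MCMC. The weights only stand in for a posterior distribution over k.

```python
import math
from collections import Counter


def knn_class_probs(train_x, train_y, query, k):
    """Standard k-NN estimate of p(class | query): the class proportions among
    the k training points closest to `query` (Euclidean distance). This estimate
    treats the class posterior as constant inside the neighborhood."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Indices of training points, sorted by distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    counts = Counter(train_y[i] for i in order[:k])
    return {c: counts.get(c, 0) / k for c in sorted(set(train_y))}


def mixture_over_k(train_x, train_y, query, k_weights):
    """Average the k-NN estimates over several neighborhood sizes.

    `k_weights` maps k -> nonnegative weight; weights are normalized to sum to 1.
    Hypothetical stand-in: in a Bayesian model these weights would reflect a
    posterior over k (e.g. estimated from MCMC draws); here they are set by hand."""
    total = sum(k_weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = {c: 0.0 for c in sorted(set(train_y))}
    for k, w in k_weights.items():
        for c, p in knn_class_probs(train_x, train_y, query, k).items():
            probs[c] += (w / total) * p
    return probs


if __name__ == "__main__":
    # Toy 1-D data: three points of class "a" near 0, four of class "b" near 1.
    train_x = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,), (1.3,)]
    train_y = ["a", "a", "a", "b", "b", "b", "b"]
    query = (0.4,)
    for k in (1, 3, 5, 7):
        print(k, knn_class_probs(train_x, train_y, query, k))
    # Equal weight on each k, as a crude stand-in for a posterior over k.
    print("mixture", mixture_over_k(train_x, train_y, query, {1: 1, 3: 1, 5: 1, 7: 1}))
```

On this toy data, class `a` gets probability 1.0 at the query for k = 1 and k = 3. At k = 7 it gets only 3/7, so the majority vote flips to `b`. This shows how sensitive a single fixed k can be. The equal-weight mixture gives `a` about 0.757, which smooths over that sensitivity.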
Similar resources
Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion dec...
Approximate Nearest Neighbor Regression in Very High Dimensions
Fast and approximate nearest-neighbor search methods have recently become popular for scaling nonparametric regression to more complex and high-dimensional applications. As an alternative to fast nearest neighbor search, training data can also be incorporated online into appropriate sufficient statistics and adaptive data structures, such that approximate nearest-neighbor predictions can be acc...
Adaptive Nearest Neighbor Classifier Based on Supervised Ellipsoid Clustering
Nearest neighbor classifier is a widely-used effective method for multi-class problems. However, it suffers from the problem of the curse of dimensionality in high dimensional space. To solve this problem, many adaptive nearest neighbor classifiers were proposed. In this paper, a locally adaptive nearest neighbor classification method based on supervised learning style which works well for the ...
Adaptive Metric Nearest Neighbor Classification
Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chisqu...
K-Nearest Neighbor Classification Using Anatomized Data
This paper analyzes k nearest neighbor classification with training data anonymized using anatomy. Anatomy preserves all data values, but introduces uncertainty in the mapping between identifying and sensitive values. We first study the theoretical effect of the anatomized training data on the k nearest neighbor error rate bounds, nearest neighbor convergence rate, and Bayesian error. We then v...
Journal title:
- Statistical Analysis and Data Mining
Volume: 3
Pages: 92–105
Publication date: 2010